New clusterization of global seaport countries based on their DEA and FDEA network efficiency scores

Global seaport network efficiency can be measured using the Liner Shipping Connectivity Index (LSCI) together with Gross Domestic Product (GDP). This paper applies k-means and hierarchical strategies to the results of Data Envelopment Analysis (DEA) and Fuzzy Data Envelopment Analysis (FDEA) in order to cluster 133 countries based on their seaport network efficiency scores. Previous studies have explored hkmeans clustering for traffic, maritime transportation management, swarm optimization, vessel trajectory prediction, vessel behaviour, vehicular ad hoc networks and similar topics. However, there remains a notable absence of clustering research that specifically addresses the efficiency of global seaport networks. This research proposes hkmeans as the best strategy for seaport network efficiency clustering, and the four newly defined clusters, low connectivity (LC), medium connectivity (MC), high connectivity (HC) and very high connectivity (VHC), are new applications in the field. Using the hkmeans algorithm, 24 countries are clustered under LC, 47 under MC, 40 under HC and 22 under VHC. With and without a fuzzy dataset distribution, this demonstrates that hkmeans clustering is consistent and practical for grouping general data types. The findings can be useful for researchers, authorities, practitioners and investors in guiding future analysis, decision-making and policymaking that involve data grouping and prediction, especially in the maritime economy and transportation industry.


Introduction
The maritime shipping industry is keen on machine learning development because it can help the sector with container freight customization and with overcoming daily problems in seaport operations. Tay et al. [1] claimed that the machine learning approach is readily favoured for achieving operational efficiency and productivity because it can enhance fuel efficiency in harbour vessels.
Moreover, machine learning is commonly used to estimate travel time, especially when there is congestion at the seaport.
Clustering is one of the machine learning applications that is widely used in many fields, such as applied sciences, military intelligence, forensic data science, computational biology, bioinformatics, business and marketing, computer science and social science. It is a strategy that organizes information into meaningful clusters for the purpose of data grouping. K-means is one of the best-known clustering algorithms and is broadly used because it minimizes the squared distance between points within the same cluster [2]. The k-means algorithm performs best when the initial cluster centres are well chosen, which leads to more accurate and meticulous clustering. According to Dhamecha [3], the k-means clustering algorithm suits large dataset applications because it improves accuracy by minimizing the total squared error. As in other typical numerical methods, the computation time of the k-means algorithm increases with the number of iterations [4].
According to Lukauskas and Ruzgas [5], clustering remains a complex matter even though numerous clustering methods exist. There is a great need for alternative procedures because typical clustering algorithms do not work well with all types of datasets. Although hierarchical clustering is one of the most common algorithms for rapid and successful implementation with certain sorts of data, there is still ample room for improving its accuracy. In fact, numerical values can indicate the level of similarity between two different hierarchical strategies when they are compared. These values are beneficial for evaluating the existing hierarchical clustering strategies [6]. Meanwhile, recent developments have made vessel trajectory prediction one of the most important areas for optimizing the safety, intelligence and efficiency of maritime transportation. An up-to-date evaluation of the available methodologies for vessel trajectory prediction, including state-of-the-art deep learning, is given in [7]. Hence, further improvement of the k-means, hierarchical and hybrid hierarchical clustering techniques is important for shaping this state-of-the-art deep learning for future smart maritime prediction.
The existing literature reveals that the majority of studies did not address the hierarchical k-means (hkmeans) clustering strategy for grouping seaport network efficiency scores. The present study therefore uses three different machine learning approaches to determine which clustering method is the most suitable for global seaport network efficiency clustering. Although the hkmeans algorithm itself is not new, its application to seaport network efficiency assessment based on the Liner Shipping Connectivity Index output is new for the 133 countries considered here. This study contributes significantly to maritime research by extending the analysis and providing a comprehensive understanding of the relationship between connectivity and global economic stability, which fills a literature gap in the maritime transportation industry. Its novel contribution is also highlighted by the introduction of four new clusters for seaport network efficiency, defined as low connectivity (LC), medium connectivity (MC), high connectivity (HC) and very high connectivity (VHC), and by the discovery of the behaviour of new cluster dendrograms and new cluster plots with the application of the hkmeans algorithm.
Currently, the data on seaport network efficiency are uncertain because of real-world data fluctuations in the maritime industry. In addition, the outcome of a clustering strategy can be affected by fluctuating data, so the interpretation of the results can be misleading. The existing limitations of the hierarchical and k-means strategies, together with the use of fuzzy data to treat uncertain data, are the motivation for the present research. Besides introducing the four new clusters to group the 133 countries (previous studies covered several individual countries and up to the top 10 ports in Southeast Asia using k-means, but none used hkmeans), the importance of the present study lies in the finding that the hkmeans clustering strategy alone can treat the uncertain data issues, with or without a fuzzy approach. Additionally, this study is important because it offers insights for better maritime data analysis, investment planning, port operations and supply chain improvements through seaport network efficiency, while promoting sustainability and progress in the economic and societal realms.
The remainder of the paper is arranged as follows. Section 2 provides a literature review of basic concepts in the k-means, hierarchical and hkmeans strategies, whereas Section 3 explains the materials and methodology of this study. Section 4 provides the findings and empirical analysis of all the clustering techniques (k-means, hierarchical and hkmeans) applied to the DEA and FDEA scores of seaport network efficiency, with comparisons between the results of these clustering strategies (for DEA and FDEA) provided in Section 4.5. Finally, Section 5 concludes the overall findings of this research.

Literature review
Clustering is a common statistical data analysis technique used in many fields, including bioinformatics, machine learning, image analysis, data mining and pattern recognition. Data are divided into smaller groups and then sorted according to a distance metric within the subgroups. There are two kinds of clustering algorithms: partitional and hierarchical. K-means clustering is a partitional method that assigns each data element to a unique cluster, so the clusters do not overlap. Under the hierarchical strategy, some clusters are subsets of other clusters. Hierarchical clustering can be agglomerative (from the bottom up) or divisive (from the top down). In light of these, hkmeans is a hybrid solution that combines the hierarchical algorithm with the partitional k-means algorithm to improve the initial non-overlapping clusters of data.
K-means is an unsupervised machine learning approach to clustering that is used when the data are unlabelled [3]. Clustering of fuzzy data by the k-means algorithm can be developed in the first stage to form clusters with similar characteristics. Hierarchical clustering, on the other hand, is widely used in the evaluation of marine traffic, pollution levels, carbon dioxide emissions, collision risk, waterway limits and economic competitiveness. Hierarchical clustering can be initiated based on a density function with linking algorithms. The hierarchical algorithm contains layers of grouping and also adopts the unsupervised clustering approach.
Hybrid hierarchical k-means clustering, also known as hkmeans clustering, is widely used in the medical field, for example in treating Eisen's yeast microarray data, protein sequences in bioinformatics and gene expression, among many other applications, but it has never been used in the maritime transportation industry for seaport network efficiency. According to Liu et al. [8] and Liu et al. [9], combining hierarchical clustering with the k-means algorithm for sound speed profiles delivers a new method for reforming the geometric model of the sea network with different ranges. This hierarchical k-means clustering is set up to overcome innate disadvantages such as the inability of standard hierarchical clustering to distinguish comparable cluster patterns. In maritime transportation, the proposed clustering has been utilized to treat high-dimensional historical data for modelling vessel behaviour [10].
Chang et al. [11] show that a few countries influence the efficiency of other countries. Hierarchical cluster analysis is used to identify trading blocs and shipping blocs based on bilateral trade intensity and liner shipping connectivity. Hierarchical clustering can therefore be performed together with the k-means algorithm based on each country's Liner Shipping Connectivity Index (LSCI). Initially, a group of countries with similar liner shipping connectivity tends to stay within its own cluster, where the distance to the closest factor of interest is checked; finally, all these countries are linked together to determine whether similar partnerships may exist between them. A tree diagram, also known as a dendrogram, can be used to represent this long chain projection of the countries' prospective separate clusters [11].
Abdulrazzak et al. [12] illustrated the feature-reduction capabilities of the k-means clustering approach. This algorithm may be started without knowing how many clusters there really are. Their study contributes parameters to the model, resulting in a more successful clustering strategy that can determine the optimal number of clusters and perform feature reduction in new hybrid clustering techniques for vehicular ad hoc networks. The development of globally connected clusters will improve the transport network efficiency of the high-speed railway system, whose performance can reduce travel time and expenses [13]. Wang et al. [14] mainly focused on the cluster distribution of nodes according to the vectors produced after two layers of a Graph Convolutional Network (GCN). Rozar et al. [15] utilized the k-means method in their investigation. To evaluate competitiveness, a number of performance analyses were conducted on 18 bulk terminals in Malaysia, which were split into two different groups according to the hierarchical clustering approaches used.
The top ten container ports in Southeast Asia may be divided into three groups using k-means clustering. Nguyen and Woo [16] found that Singapore is still the region's leading port, despite competition from Port Klang, Tanjung Pelepas and Saigon Newport. A port must have stronger connections to other container ports and higher container throughput in order to be recognized as a hub port [17]. This shows that, although k-means clustering has been used in maritime transportation, 10 ports are far fewer than the ports of the 133 countries covered in the present global hub port clustering study. The hkmeans clustering approach has been used to cluster typical scenarios of an island power supply system [18]. Only a limited number of studies have used hkmeans clustering, and those are far removed from the present topic of global seaport network efficiency clustering. Recently, Wang et al. [19] suggested a clustering algorithm with features and robust scaling for clustering ship AIS data, derived using the Hausdorff distance and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). According to Andrade et al. [20], the top five most efficient ports are those with the highest cargo throughput, which shows a significant link between cargo throughput and port efficiency rating among Brazilian ports. Their clustering algorithm classified the Brazilian ports into three categories: efficient, moderately efficient and inefficient. This again shows that the study was conducted only for a single country's ports and produced three efficiency clusters.
Martinez-Budrai et al. [21] used the DEA scores of 26 Spanish port authorities to divide the ports' levels of complexity into three categories. Following this, Quresma Dias et al. [22] focussed on 10 Iberian Peninsula container terminals, while Guironnet et al. [23] examined the technical efficiency of 24 Italian and 13 French ports using DEA and clustered the ports into geographical groupings. Similarly, Sharma and Yu [24], Koster et al. [25], Cheon [26], Cullinane and Wang [27], Wu and Goh [28], Cheon et al. [29] and Bichou [30] used DEA to assess the technical efficiency of 70, 38, 110, 25, 21, 98 and 60 global container terminals, respectively. Afterwards, terminal clusters obtained from Serviceable Obtainable Market (SOM) and local competition were grouped based on ownership and corporate change by Cheon et al. [29]. Tovar and Rodríguez-Déniz [31] clustered 26 Spanish port authorities using the dendrogram cutoff in hierarchical clustering. The present literature survey reveals that all these past studies predicted technical efficiency using the DEA model and only two studies utilized hierarchical clustering.
The results based on Zhanjiang Port [9] show that the hybrid clustering technique can effectively cluster ship trajectories and categorize ship traffic. Yet the effectiveness of the seaport network based on LSCI and GDP output has never been studied using hkmeans clustering. The majority of previous research has focused on traffic, maritime transportation management, swarm optimization, vessel trajectory prediction, vessel behaviour, vehicular ad hoc networks and similar topics, but there has not been a single clustering work on the effectiveness of the seaport network that compares various strategies (k-means, hierarchical and hkmeans) using the four presently defined clusters (LC, MC, HC, VHC). The lack of accuracy has prompted the combination of the DEA model with fuzzy set theory, resulting in FDEA, to tackle the existence of outputs deemed undesirable [32]. To build on these efforts, the present paper proposes the hybrid hkmeans strategy for clustering the seaport network efficiency scores of 133 countries obtained from DEA and FDEA, and compares the results of the k-means, hierarchical and hkmeans techniques. The fact that hkmeans clustering of seaport network efficiency based on LSCI and GDP output has never been done motivates the present study. Moreover, the introduction of the four new level clusters with different specifications is important for the global maritime industry, because the findings on seaport network efficiency contribute towards a country's efficiency and hence its economic growth.
The main contributions of the present research are as follows:
1. This study introduces four new level clusters, specified as low connectivity (LC), medium connectivity (MC), high connectivity (HC) and very high connectivity (VHC), for clustering the seaport network efficiency scores. The nearest study, by Andrade et al. [20], categorized leading Brazilian ports based on cargo throughput efficiency and divided them into only three groups (highly efficient, moderately efficient and inefficient).
2. This study clusters 133 global seaport countries, which is the highest number of countries considered in this research area. Previously, Nguyen and Woo [16] employed k-means cluster analysis for the countries of the top 10 Southeast Asian ports.
3. This study applies the k-means and hierarchical strategies and recommends a third strategy, hkmeans (hierarchical k-means), for seaport network efficiency clustering. In the nearest past study, the k-means strategy was applied in social network analysis within the maritime transportation context, focusing on the top 10 Southeast Asian ports [16].
4. The present study uses LSCI and GDP output in both the DEA and FDEA computations prior to clustering, which has never been carried out before. Previously, Chang et al. [11] used only LSCI to perform a hierarchical clustering strategy.

Data sources and variables
The seaport network efficiency scores are calculated from four input variables (time in port, age of vessels, size of vessels and cargo carrying capacity) and two output variables (gross domestic product (GDP) and liner shipping connectivity index (LSCI)) of the 133 global seaport countries listed in Table 1. The input variables are collected from United Nations Conference on Trade and Development statistics (UNCTADstat) and the output variables from World Development Indicators (WDI). The links to the UNCTADstat and WDI data sources are provided in this work's data availability statement. Efficiency was assessed using both Data Envelopment Analysis (DEA) and Fuzzy Data Envelopment Analysis (FDEA), the latter incorporating triangular fuzzy number theory. The study used MaxDEA software to compute the seaport network efficiency scores. A prior study [33] established that, when dealing with both real and fluctuating data, triangular fuzzy numbers are more proficient than trapezoidal fuzzy numbers for calculating FDEA efficiency scores.
Firstly, a linear programming (LP) problem is formulated as in Eqs (1) to (3) [33]. To make this LP viable for DEA, Eqs (1) to (3) are reformulated as Eqs (4) to (7). Eqs (1) to (3) can also be modified to allow room for fuzzy numbers, with the inclusion of L (minimum value), A (mean value) and M (maximum value), to form the LP in Eqs (8) to (10). Similarly, Eqs (8) to (10) can be reformulated to fit FDEA as Eqs (11) to (15).
The results of DEA (Eqs (4) to (7)) and FDEA (Eqs (11) to (15)), based on three years of publicly available data (2018-2020), are then used to perform the clustering. Three clustering strategies are explored in this work: k-means, hierarchical and hierarchical k-means (hkmeans). These algorithms are coded in R using RStudio to construct four new clusters that group the 133 countries based on the DEA and FDEA seaport network efficiency scores obtained previously. The three clustering strategies are elaborated and compared step by step in the following sections.
Fuzzy Data Envelopment Analysis (FDEA) is a method used to assess the efficiency of decision-making units (DMUs) when dealing with uncertain or imprecise data [34]. Its advantages include:
1. Handling uncertainty: FDEA accommodates uncertain data by allowing for degrees of membership, making it suitable for situations with incomplete or noisy data.
2. Flexibility: It provides flexibility in modeling input-output dynamics, encompassing diverse performance-contributing factors, particularly in scenarios where quantification presents challenges.
3. Robustness: FDEA is robust against outliers and extreme values, producing more reliable efficiency scores, especially in complex systems with variable data.
4. Accounting for subjectivity: It captures subjective judgments and expert opinions, incorporating qualitative factors into evaluations.
5. Interpretability: FDEA provides easily interpretable results, identifying relative efficiency of DMUs and areas for improvement.
FDEA was chosen for seaport network efficiency in this study because it can handle uncertain data, which are common in maritime contexts. It incorporates fuzzy set theory, allowing for alternate evaluations of efficiency amidst the complexities and uncertainties of the data.
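For orientation, the block below sketches two textbook building blocks that formulations of this kind usually rest on: the input-oriented CCR multiplier model of DEA and the membership function of a triangular fuzzy number. It is offered under the assumption that the formulation cited from [33] follows this standard form, which the text here does not confirm. The symbols $\theta_o$, $u_r$, $v_i$, $x_{ij}$, $y_{rj}$ and $\mu_{\tilde{a}}$ are illustrative choices and are not taken from the source; only L, A and M (minimum, mean and maximum value) and the counts of four inputs, two outputs and 133 countries come from the text.

```latex
% Standard input-oriented CCR multiplier model for the evaluated DMU o
% (assumed textbook form; m = 4 inputs, s = 2 outputs, n = 133 DMUs here)
\begin{align*}
\max_{u,\,v} \quad & \theta_o = \sum_{r=1}^{s} u_r \, y_{ro} \\
\text{subject to} \quad & \sum_{i=1}^{m} v_i \, x_{io} = 1, \\
& \sum_{r=1}^{s} u_r \, y_{rj} - \sum_{i=1}^{m} v_i \, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0 .
\end{align*}

% Membership function of a triangular fuzzy number \tilde{a} = (L, A, M), with L < A < M
\[
\mu_{\tilde{a}}(x) =
\begin{cases}
\dfrac{x - L}{A - L}, & L \le x \le A, \\[6pt]
\dfrac{M - x}{M - A}, & A \le x \le M, \\[6pt]
0, & \text{otherwise.}
\end{cases}
\]
```

In this reading, each country is one DMU, and a fuzzy variant would replace each crisp input or output value with a triple (L, A, M) before the efficiency score is computed and defuzzified.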

K-means clustering.
In this section, the step-by-step procedure for k-means clustering is outlined. The k-means algorithm is conducted in four steps [35].
Step 1: Determination of the k-value: The number of clusters to be used in the study is chosen, and the initial cluster centres are selected randomly.
Step 2: Finding the nearest centroid: The nearest centroid is based on the Euclidean distance between the observation and the centroid. The Euclidean distance between two points $a(x_1, y_1)$ and $b(x_2, y_2)$ is given in Eq (16):
$$d(a, b) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{16}$$
Step 3: For each k-means cluster, a new mean value $\mu_i$ of all data considered is recalculated using Eq (17), where $P_i$ is the set of all observations allocated to the i-th cluster:
$$\mu_i = \frac{1}{|P_i|} \sum_{x_j \in P_i} x_j \tag{17}$$
Step 4: Steps 2 and 3 are repeated until the total sum of squares is minimized and the centroids no longer change, or until the maximum number of iterations has been reached.
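The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the R code used in the study; the function names `euclidean` and `kmeans`, the fixed random seed and the rule for empty clusters are assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension (Eq 16, generalised)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)             # fixed seed: an assumption, for repeatability
    centroids = rng.sample(points, k)     # Step 1: k initial centres picked at random
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its members (Eq 17)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps its old centre
        # Step 4: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

For one-dimensional efficiency scores, each score can be wrapped in a one-element tuple, for example `kmeans([(0.10,), (0.85,), (0.12,)], 2)`.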

Elbow method.
An essential component of this approach is determining the appropriate number of clusters. The elbow method is a widely used heuristic in cluster analysis for estimating the appropriate k-value for a dataset [35]. The procedure entails plotting the explained variation as a function of the number of clusters and choosing the elbow of the curve as the appropriate number of clusters.
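As a rough illustration of the elbow heuristic, the sketch below computes the total within-cluster sum of squares for several k-values on a small set of one-dimensional scores. The scores are hypothetical, the helper names `kmeans_1d` and `total_wss` are invented for the example, and the evenly spaced initial centres are an assumption chosen to keep the run deterministic; none of this is taken from the study's R code.

```python
def kmeans_1d(values, k, max_iter=100):
    """K-means on scalar values with deterministic, evenly spaced initial centres."""
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # an empty group keeps its previous centre
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return centres, groups


def total_wss(values, k):
    """Total within-cluster sum of squares for a given number of clusters."""
    centres, groups = kmeans_1d(values, k)
    return sum((v - c) ** 2 for c, g in zip(centres, groups) for v in g)


# Hypothetical efficiency scores forming three visible groups
scores = [0.10, 0.12, 0.15, 0.40, 0.42, 0.45, 0.80, 0.83, 0.85]
curve = {k: total_wss(scores, k) for k in range(1, 7)}
for k, wss in curve.items():
    print(k, round(wss, 4))
```

Plotting `curve` against k gives the elbow plot: the sum of squares drops steeply up to the natural number of groups and flattens afterwards.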

Hierarchical clustering.
Hierarchical clustering is performed as the second objective of this study. This strategy uses the distances between observations to generate new clusters. The procedure consists of five steps [35].
Step 1: The distances between each pair of points are determined using a distance metric.
Step 2: Each data point is assigned to its own cluster.
Step 3: The clusters that are most similar to one another are grouped together.
Step 4: The distance metric is refreshed.
Step 5: Steps 3 and 4 are repeated until a single cluster is obtained.
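The five steps can be sketched as a bottom-up (agglomerative) procedure. The example below is a minimal sketch that assumes average linkage on one-dimensional scores, since the linkage rule is not stated here; the function name `agglomerate` and the `n_clusters` stopping parameter are hypothetical.

```python
def agglomerate(values, n_clusters=1):
    """Agglomerative clustering of scalar values; returns (clusters, merge history)."""

    def linkage(a, b):
        # average pairwise distance between two clusters (assumed linkage rule)
        return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

    clusters = [[v] for v in values]      # Step 2: each point starts in its own cluster
    merges = []
    while len(clusters) > n_clusters:     # Step 5: repeat until the target is reached
        # Steps 1 and 4: (re)compute the distances between all current clusters
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        dist, i, j = min(pairs)
        # Step 3: merge the two most similar clusters
        merged = clusters[i] + clusters[j]
        merges.append((dist, sorted(merged)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges
```

The merge history records the distance at which each pair was joined, which is the information a dendrogram displays; stopping at `n_clusters=4` corresponds to cutting the tree into four groups.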

Hierarchical k-means clustering.
The hkmeans strategy is carried out as follows [35]:
Step 1: Hierarchical clustering is performed.
Step 2: The tree is cut to obtain k clusters.
Step 3: The centroid of each cluster is determined by averaging its members.
Step 4: The k-means algorithm is performed using the set of centroids calculated in Step 3 as the initial cluster centres.
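A compact way to see how the hybrid differs from plain k-means is that the initial centres come from a hierarchical cut rather than from a random draw. The sketch below follows the four steps on one-dimensional scores; it assumes average linkage for the hierarchical stage, and the names `hierarchical_cut`, `kmeans_from` and `hkmeans` are hypothetical rather than the R functions used in the study.

```python
def hierarchical_cut(values, k):
    """Steps 1-2: merge the closest clusters (average linkage, assumed) until k remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                d = sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


def kmeans_from(values, centres, max_iter=100):
    """Step 4: k-means started from the given centres instead of random ones."""
    centres = list(centres)
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return centres, groups


def hkmeans(values, k):
    tree_clusters = hierarchical_cut(values, k)               # Steps 1 and 2
    initial = [sum(c) / len(c) for c in tree_clusters]        # Step 3: cluster means
    return kmeans_from(values, initial)                       # Step 4
```

Because the starting centres are fixed by the tree, repeated runs on the same data give the same partition, which is one plausible reason for the consistency attributed to hkmeans in this study.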
Fig 1 illustrates the step-by-step process from data collection to clustering. Initially, four input variables were gathered from UNCTADstat, along with two outputs from WDI. Subsequently, the data were screened to eliminate outliers and normalize the data. Once the screened data were ready, Data Envelopment Analysis (DEA) was executed using MaxDEA. Afterward, the screened data were used for Fuzzy Data Envelopment Analysis (FDEA), which involved data fuzzification to generate Triangular Fuzzy Numbers (TFN), followed by defuzzification to derive the final FDEA scores. To cluster countries by efficiency, both the DEA and FDEA scores were subjected to clustering through the k-means and hierarchical strategies using R. Additionally, to address fluctuations in maritime data, the hkmeans clustering strategy was applied. The research concludes by comparing the outcomes obtained from the three clustering methods considered.

Outlier detection
The work starts with outlier identification, since clustering is very sensitive to outliers and should only be performed on data that are free from them. A box plot is therefore drawn to check for outliers, as shown in Fig 2. The box plot identifies no outliers in the DEA and FDEA results, which indicates the absence of any extreme values.
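A box plot conventionally flags points that lie more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule numerically; it assumes this common Tukey convention and an inclusive quartile definition, which may differ slightly from the hinges used by the plotting routine in the study, and the function name `tukey_outliers` is hypothetical.

```python
import statistics


def tukey_outliers(values, whisker=1.5):
    """Return the values lying beyond the box-plot whiskers (1.5 x IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]
```

An empty list for both the DEA and FDEA scores would correspond to the finding that no outliers are present.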

K-means clustering results
K-means clustering is performed on the seaport network efficiency scores obtained from DEA and FDEA. Further analysis and comparison between the two datasets can be performed once the k-value has been determined, which precedes finding the nearest centroid. The k-means clustering developed in this study calculates the sum of squares and the average distance of the points in the seaport network efficiency data.
The present study utilizes the elbow method, which guides the search for the best k-value for the data. A plot is developed with the number of clusters and the sum of squares. The gap statistics for 2020 reveal that the number of clusters for DEA can be 3, 4, 5 or 6, whereas for FDEA it can be 3 or 4. This is an undesirable output of DEA as compared to FDEA, and it might be rooted in misleading interpretations caused by uncertainty and fluctuation in real-world data, particularly during the COVID-19 pandemic. It is therefore evident that the k-means strategy is more sensitive for the FDEA dataset than for the DEA dataset owing to the contribution of fuzziness. Hence, the k-means strategy provides better clustering for a fuzzy data distribution. On the other hand, the data from 2020 are irregular because of the impact of the COVID-19 outbreak on the seaport network efficiency scores.
Based on the variations in 2018, 2019 and 2020, the plots for all three years show that 3 and 4 clusters are optimal. Following this result, this research considers 4 clusters in each clustering approach on the DEA and FDEA datasets. These 4 new level clusters are specified as low connectivity (LC), medium connectivity (MC), high connectivity (HC) and very high connectivity (VHC).

Hierarchical clustering results for 2018-2020
Hierarchical clustering is another method that can cluster a set of data into groups. It proceeds by repeating two steps: identifying the two clusters that are closest together, and then combining these two most alike clusters. The graphs comparing DEA and FDEA for 2018, 2019 and 2020 are shown in Figs 6-8, respectively. Here the hierarchical clustering package in RStudio automatically sets black as the default colour for the dendrogram branches and the country numbers. The hierarchical clustering starts with each country number (point) assigned to a separate cluster. The nearest clusters are then combined into bigger clusters until four clusters remain, which can be displayed using the dendrogram. The cluster dendrogram shows the data points on the x-axis and the distance between the clusters on the y-axis. The green line represents the domain of each cluster. Hierarchical cluster is a decision tree that divides the [...]

Hierarchical k-means (Hkmeans) clustering results
This study further explores the hkmeans strategy to optimize the clustering outputs for the seaport network efficiency scores. The novelty of this work lies in the fact that no study in the maritime industry has used the hkmeans strategy to cluster the seaport network efficiency scores of 133 countries. The hkmeans clustering strategy is proposed because of the drawbacks of the conventional k-means and hierarchical algorithms, which produce variation of results in the [...] clustering algorithm. Moreover, Turkey is classified under the HC and VHC clusters for the DEA and FDEA datasets, respectively. Note that only the clustering results of these 6 countries changed with the hkmeans algorithm. The clustering results for the other countries remain under the same clusters, with improved accuracy in cluster prediction through the integration of the hkmeans clustering technique. With and without a fuzzy dataset distribution, this demonstrates that hkmeans clustering is consistent and practical for grouping general data types. Hence the hkmeans strategy is an appropriate tool for seaport network efficiency clustering.
The graph shows the difference between the clusters LC, MC, HC and VHC based on the seaport network efficiency obtained in the previous analysis. In the figure, MC has the highest frequency among all the clusters. The y-axis represents the frequency of the countries involved in this study and the x-axis represents the cluster categories. It is observed that the k-means results for the DEA and FDEA data are approximately similar, as opposed to the hierarchical results for the two datasets. Moreover, the hierarchical clustering results show some fluctuation in 2018 and 2020 for seaport network efficiency, which indicates that the hierarchical clustering strategy is not stable as compared to the [...] refining it with the k-means strategy. Examining the differences and connections among the clusters created by each algorithm allows a deeper understanding of the fundamental patterns within the dataset. These methodologies provide the insights needed to make informed decisions about choosing the most suitable clustering approach for a specific application [38].

Differences in hierarchical, k-means and hkmeans clustering results
The hkmeans method is conducted by first employing the hierarchical method to determine the k-value, where the tree is cut into clusters. There are four seaport network efficiency clusters, LC, MC, HC and VHC, represented by four coloured main tree branches as depicted in Figs 11 and 12. Under these clusters, the numbers representing the seaport countries are classified according to their seaport network efficiency level. In both Figs 11 and 12, the dendrogram of the hierarchical algorithm is marked in purple, blue, green and red, whereas the hkmeans dendrogram is displayed in black, green, red and blue, to represent the MC, LC, HC and VHC clusters, respectively.
Based on Fig 11, a few countries have moved from the MC cluster of the hierarchical diagram to the HC cluster (red) after application of the hkmeans method: Belize, Grenada, Cameroon, Philippines, United Arab Emirates, Angola, Brunei Darussalam and Saudi Arabia. Cambodia is the only country that has moved from the MC cluster of the hierarchical method to the VHC cluster (blue) of the hkmeans method. From the hierarchical LC cluster (green), the majority of the countries are reassigned to the hkmeans HC cluster, while the countries that stay in the LC cluster under both the hierarchical and hkmeans strategies are Myanmar, Georgia, Solomon Islands, Guam, Latvia, Sierra Leone, Libya and Maldives. Finally, the hierarchical HC cluster has an intriguing feature: all the countries in this cluster have changed to the VHC cluster of hkmeans, whereas all the countries from the VHC cluster of the hierarchical clustering have shifted to the LC cluster of the hkmeans clustering. In general, this figure shows how the hierarchical clustering results can differ from the results of hkmeans clustering.
Fig 12 shows the comparison between the hierarchical and hkmeans clustering results for the FDEA dataset. It illustrates that all countries in the hierarchical VHC cluster have been moved to the hkmeans MC cluster, while all countries in the hierarchical HC cluster have been moved to the hkmeans VHC cluster except for Guatemala. In addition, Micronesia, Mozambique, Myanmar, Georgia, Solomon Islands, Guam, Sierra Leone, Togo, Libya and Maldives are transferred from the hierarchical LC cluster to the hkmeans MC cluster, while the other countries in the hierarchical LC cluster remain in the same cluster after applying the hkmeans strategy. All of the countries in the hierarchical MC cluster have been moved to either the LC or the HC cluster under the hkmeans strategy.
[...] shows the cluster plot for FDEA in which the clusters are displayed in two-dimensional space. Dim1, a new variable that accounts for 92.7% of the variation, corresponds to the horizontal axis, while Dim2, which accounts for 5.2% of the variation, corresponds to the vertical axis. Together, the two dimensions account for 97.9% of the total variation.
This [...] 8, the k-means algorithm has an issue in determining the k-value, whereas the hierarchical method overestimates the clustering, which does not provide a good conclusion based on the diagrams. These problems can be treated by implementing the hkmeans clustering, where the hierarchical strategy is first used to determine the k-value and the k-means strategy is then used to generate the data clusters. Hence, this study highlights the significance of the hkmeans clustering technique in overcoming the drawbacks of the individual partitional k-means and hierarchical algorithms, so that better clustering results can be produced in terms of consistency for general data types and non-overlapping data composition for each cluster.
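The two-stage procedure described above (a hierarchical pass that supplies the starting partition, followed by k-means refinement) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the implementation used in the study: the function names, the centroid-linkage merge rule and the toy points are assumptions made for brevity, and k is passed in directly rather than read off a dendrogram.

```python
import math

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def hierarchical_cut(points, k):
    """Agglomerative clustering, stopped once k clusters remain.
    Assumption: clusters are merged by smallest centroid distance."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the supplied centers."""
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def hkmeans(points, k):
    """Hierarchical k-means: hierarchical centroids seed the k-means step."""
    initial = [centroid(c) for c in hierarchical_cut(points, k)]
    return kmeans(points, initial)

# Toy, made-up 2-D points forming four well-separated groups (not study data).
points = [(0.10, 0.12), (0.12, 0.10), (0.40, 0.42), (0.42, 0.40),
          (0.70, 0.68), (0.68, 0.70), (0.95, 0.97), (0.97, 0.95)]
labels, centers = hkmeans(points, 4)
print(labels)
```

Because the k-means step starts from the hierarchical centroids instead of random points, repeated runs on the same data give the same partition, which is the consistency property the text attributes to hkmeans.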
Table 2 displays the composition of the 133 countries grouped under the present four clusters of seaport network efficiency using the k-means, hierarchical and hkmeans strategies applied to the DEA and FDEA datasets. It shows that the k-means results for the DEA and FDEA datasets are identical for the very high connectivity (VHC) and low connectivity (LC) clusters, with 26 (19.55%) and 41 (30.83%) countries respectively. The results differ for the medium connectivity (MC) and high connectivity (HC) clusters, which contain 52 (39.10%) and 14 (10.53%) countries for DEA but 14 (10.53%) and 52 (39.10%) countries for FDEA. The results in this table are calculated by combining the three years 2018-2020 at once, unlike the individual yearly analysis in Fig 10.
Under hierarchical clustering, 50 (37.59%) and 55 (41.35%) countries are clustered under LC for DEA and FDEA respectively. The hierarchical composition is markedly uneven, with 56.63% and 0.75% of the countries categorized under the HC and VHC clusters, respectively, for both the DEA and FDEA datasets. Moreover, in comparison with the k-means and hkmeans strategies in Table 2, the hierarchical clustering strategy places the smallest share of countries under the MC cluster, with 9.02% and 5.26% for DEA and FDEA respectively. This demonstrates that the hierarchical strategy might not be the best tool to cluster the countries by seaport network efficiency, owing to the overall imbalanced composition of countries across the resulting clusters.
The percentages in Table 2 show that the hkmeans clustering yields country compositions of 15.79% (LC), 36.84% (MC), 30.08% (HC) and 17.29% (VHC) for DEA, and 18.05% (LC), 35.34% (MC), 30.08% (HC) and 16.54% (VHC) for FDEA. Compared with the k-means and hierarchical clustering results, the country compositions under the four new seaport network efficiency clusters obtained through the hkmeans strategy are the most balanced, with minimal variation between the regular and fuzzy data distributions. This suggests that the hkmeans strategy is the most suitable tool for global seaport network efficiency clustering.
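The composition percentages quoted above follow directly from the cluster counts over 133 countries, and the DEA-versus-FDEA variation can be summarised as the largest change in any cluster share. The short sketch below reproduces that arithmetic. The k-means and hkmeans (FDEA) counts are the ones reported in the paper; the hkmeans (DEA) counts are an assumption, back-calculated from the reported percentages, and `max_shift` is an illustrative measure rather than one defined in the study.

```python
counts = {
    "k-means (DEA)":  {"LC": 41, "MC": 52, "HC": 14, "VHC": 26},
    "k-means (FDEA)": {"LC": 41, "MC": 14, "HC": 52, "VHC": 26},
    # Assumption: counts back-calculated from the reported percentages.
    "hkmeans (DEA)":  {"LC": 21, "MC": 49, "HC": 40, "VHC": 23},
    "hkmeans (FDEA)": {"LC": 24, "MC": 47, "HC": 40, "VHC": 22},
}

def shares(cluster_counts):
    """Percentage of countries in each cluster, rounded to two decimals."""
    total = sum(cluster_counts.values())
    return {name: round(100 * n / total, 2) for name, n in cluster_counts.items()}

def max_shift(counts_a, counts_b):
    """Largest absolute change in any cluster share, in percentage points."""
    a, b = shares(counts_a), shares(counts_b)
    return max(round(abs(a[name] - b[name]), 2) for name in a)

for strategy, cluster_counts in counts.items():
    print(strategy, shares(cluster_counts))

print("k-means DEA vs FDEA:", max_shift(counts["k-means (DEA)"], counts["k-means (FDEA)"]))
print("hkmeans DEA vs FDEA:", max_shift(counts["hkmeans (DEA)"], counts["hkmeans (FDEA)"]))
```

On these counts the largest share change between DEA and FDEA is 28.57 percentage points for k-means against 2.26 for hkmeans, which is one way to quantify the "minimal variation" described in the text.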
Table 3 summarizes the four clusters obtained using the hkmeans strategy for the FDEA dataset. Compared with the hkmeans clustering results for DEA, the difference is minimal: only six countries, namely Brunei Darussalam, Congo, Latvia, Sierra Leone, Solomon Islands and Turkey, are categorized under a different cluster in DEA, while the remaining 127 countries stay in the same cluster for both DEA and FDEA. This table is produced for FDEA only, as a sample outcome of the hkmeans clustering method when the dataset involves fuzziness that may represent real fluctuations in the raw data caused by pandemic, economic, social, political or environmental factors.
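A comparison of this kind, between two cluster assignments of the same countries, amounts to listing the items whose label changed and reporting the share that did not. The sketch below shows one way to do this. The function name and the placeholder labels ("Country A" to "Country D") are hypothetical and are not taken from Table 3; only the 127-of-133 figure comes from the text.

```python
def compare_assignments(labels_a, labels_b):
    """Return (moved, agreement) for two cluster labelings of the same items.

    moved     -- sorted names whose cluster differs between the labelings
    agreement -- fraction of items assigned to the same cluster in both
    """
    if labels_a.keys() != labels_b.keys():
        raise ValueError("both labelings must cover the same items")
    moved = sorted(name for name in labels_a if labels_a[name] != labels_b[name])
    agreement = 1 - len(moved) / len(labels_a)
    return moved, agreement

# Placeholder labelings for illustration only (not the study's assignments).
dea = {"Country A": "LC", "Country B": "MC", "Country C": "HC", "Country D": "VHC"}
fdea = {"Country A": "LC", "Country B": "HC", "Country C": "HC", "Country D": "VHC"}
moved, agreement = compare_assignments(dea, fdea)
print(moved, agreement)

# Agreement implied by the counts reported in the text: 127 of 133 unchanged.
reported_agreement = round(100 * 127 / 133, 2)
print(reported_agreement)
```

With the reported counts, the DEA and FDEA hkmeans assignments agree for about 95.49% of the countries.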
The k-means strategy excels when dealing with clusters that are approximately spherical and have a similar number of data points. In contrast, the hierarchical clustering strategy stands out because it eliminates the need to determine the cluster count in advance, making it ideal for situations where the optimal number of clusters is uncertain. The hkmeans strategy, in turn, proves its worth in handling intricate data scenarios. It is effective when addressing data with clusters at varying scales or when there is a need to explore both local and global structures within the data.
In this study, four novel clusters, namely low, medium, high and very high connectivity (LC, MC, HC and VHC), have been introduced to better comprehend a country's efficiency category among the global seaport countries. The adoption of the hkmeans strategy in this previously unexplored context has revealed new opportunities and potential outcomes for sustainable future works. Hopefully, these discoveries can empower effective decision-making and policy formulation, especially within the maritime industry framework.
The key insight from the empirical results for the countries grouped under LC is that their geographical locations are isolated or restricted from access to major transportation routes, because most of these countries are islands. Being situated in remote and secluded areas exposes these countries to naturally challenging terrain and insufficient opportunities for infrastructure investment in transportation networks. Consequently, such countries may encounter difficulties in establishing robust connections to global markets through seaports, airports or extensive road and rail systems, resulting in a low level of connectivity compared with the more accessible or well-connected countries under the MC, HC and VHC clusters.
On the other hand, the key insight from the empirical results for the countries grouped under VHC is that these countries typically share common characteristics, such as being developing or more developed economies, having robust technological infrastructure, experiencing high levels of digital literacy that foster strong education systems, established global integration through trade and investment as well as openness to foreign investments, continuous urbanization efforts supported by government policies, and political stability. These factors collectively create an environment where connectivity is widely accessible and essential for various aspects of modern life, including business, education, healthcare and social interactions, for the countries under the new VHC cluster. For instance, despite being a developing country, Bangladesh is categorized under VHC due to its economic strength as one of the world's top garment producers and exporters from 1989 until the present.

Conclusion
K-means, hierarchical and hierarchical k-means (hkmeans) clustering strategies are applied in this study to categorize 133 countries based on their seaport network efficiency scores. These scores were obtained from DEA and FDEA implementations with LSCI and GDP as the output variables. Four new level clusters have been introduced and they are sufficient to group all the global seaport countries considered. Hkmeans eliminates the sensitivity issue in the k-value selection of the k-means strategy while producing more consistent results between regular and fuzzy data distributions than the hierarchical clustering strategy. Moreover, because the hkmeans strategy combines the partitional k-means and hierarchical algorithms, the initial partitioning of the k-means strategy can be improved to generate better clustering results in terms of general data consistency and clustered data composition.
Some limitations of the study should, however, be noted:
1. The quality of clustering results depends on the availability of sufficient data.
2. If the current free and publicly accessible maritime data becomes unavailable in the future, or if variables are altered or missing, it will lead to a reduction in the dataset size.
3. Consequently, the number of seaports considered in the study will be impacted due to the reduced availability of data.
4. The findings of this study are subject to uncertainties and fluctuations in maritime data due to the COVID-19 pandemic in 2020. The newly formed hkmeans clusters may change again as new global challenges arise and as new and more data are added.
The present work can be extended on the existing data by employing a wider variety of supervised or unsupervised machine learning methods, such as naive Bayes and support vector machines. These algorithms can also be combined with other statistical techniques such as Monte Carlo and Latin Hypercube Sampling to treat random data samples, while other FDEA methods based on α-level, fuzzy ranking and probability approaches can be explored to provide variations in the FDEA results used in the clustering strategies. To ensure the cluster analysis remains adaptable to changing conditions, a data monitoring system that tracks external events and frequently updates the dataset to reflect environmental shifts can be established. Additionally, machine learning and AI technologies can be leveraged to automatically fine-tune the clustering models in response to incoming data and external signals, and to enhance the ability of the analysis to accommodate dynamic changes and disruptions efficiently.

4.5.1 Hierarchical and k-means results for 2018-2020. The comparisons between hierarchical and k-means strategies for DEA and FDEA results are shown in Fig 10.

4.5.3 K-means versus hkmeans for DEA and FDEA. Figs 13 and 14 demonstrate the different clusters with k-means and hkmeans clustering strategies for both DEA and FDEA seaport network efficiency datasets. The cluster plots show that a few countries have been moved to other clusters following the use of hkmeans clustering strategy with respect to the countries'